Instructors:
Zongxu Zhang (zongxu.zhang@stu.pku.edu.cn)
Zexian Zeng (zexianzeng@pku.edu.cn)
Affiliation:
Center for Quantitative Biology
Academy for Advanced Interdisciplinary Studies
Peking University
Lab Website:
🔗 http://cqb.pku.edu.cn/zenglab/
This materials is largely based on CellphoneDB, we thank the origional authors for their contribution.
Dynamo
A python packages for RNA velocity analysis, supplemented with differential gemotary analysis to reveal gene regulation process involved in differential process.
💡 Note: This course will primarily focus on Python-based workflows for scRNA-seq data analysis.
Cell communication is a process by which cells react to stimuli from their environment and also from themselves.
Autocrine signalling refers to intracellular communication whereby cells secrete ligands that are used to induce a cellular response through cognate receptors for those molecules expressed on the same cell.
Paracrine cell–cell communication does not require cell–cell contact, rather depending on the diffusion of signalling molecules from one cell to another after secretion.
Juxtacrine, that is, contact-dependent, cell–cell communication relies on gap junctions or other structures such as membrane nanotubes to pass signalling molecules directly between cells, without secretion into the extracellular space. Endocrine cell–cell communication represents intercellular communication whereby signalling molecules are secreted and travel long distances through extracellular fluids such as the blood plasma; typical mediators of this communication are hormones.
Cell communication is a process by which cells react to stimuli from their environment and also from themselves. In multicellular organisms, the dynamic coordination of cells, also called cell-cell communication (CCC), is involved in many biological processes, such as apoptosis and cell migration, and is consequently essential in homeostasis and disease.
CCC commonly focuses on protein-mediated interactions, most typically perceived as a secreted ligand binding to its corresponding plasma membrane receptor. However, this picture can be broadened to include secreted enzymes, extra-cellular matrix proteins, transporters, and interactions that require the physical contact between cells, such as cell-cell adhesion proteins and gap junctions.
Cell communication is further not independent of other processes, but the contrary, as external stimuli commonly elicit a downstream response. In the case of CCC, this is typically perceived as the induction of canonical pathways and downstream transcription factors in the cells receiving the signal, or receiver cells. Ultimately these external stimuli alter the function of receiver cells, and this alteration is further propagated via the subsequent interaction of these cells with their microenvironment.
Traditionally, the study of CCC required specialized in-situ biochemical assays, such proximity labelling proteomics, co-immunoprecipitation, and yeast two-hybrid screening. Yet, the rapid developments and dropping costs of transcriptomics data generation has enabled a paradigm shift away from focusing on which types of cells are present, but rather on the relationships between them. As a consequence, CCC inference from single-cell data is now becoming a routine approach, capable of providing a system-level hypotheses of intercellular crosstalk in vivo.
Typically, inference of Cell-cell communication from single cell RNA seq data involoves 6 steps:
Step 1 : Collection of samples or cells for transcriptomic analysis
Step 2 : Data preprocessing and generation of expression matrix
Step 3 : Collection of interacting proteins or ligand-receptors pairs
Step 4 : Filtering by interacting proteins or ligand-receptor pairs
Step 5 : Cell-cell interaction and communication analysis
Scoring function for cell cell communication
Step 6 : Results interpretation and visuzalization
A simple numerical cases:
Two primary inputs are used for quantifying communication scores: a preprocessed gene expression matrix (part a) and a list of interacting proteins to supervise the analysis (for example, ligand–receptor pairs) (part b). Then a communication score (CS) can be computed for every ligand–receptor pair in a given pair of cells. Here, we show how to perform these calculations for four core functions (parts c–f). These are applied to elucidate paracrine (parts c,d) and autocrine (parts e,f) communication. To assess cell–cell communication, a CS can be computed for each ligand–receptor pair by accounting for the presence of both partners if their expression is greater than a given threshold, which for demonstrative purposes was set arbitrarily to a value of 3.3 (part c), or by multiplying their expression values (part d). Similarly, the CS for each ligand–receptor pair can be the correlation score obtained from their expression across all cell types for autocrine communication (part e). To reveal non-autocrine interactions, the correlation can be computed across pairs of different cells. Particular signatures of each cell type can be extracted through analysing differentially expressed ligands and receptors. Using the cell type-specific differentially expressed genes, we can assign a binary CS and study the ligand–receptors used for autocrine communication (part f). In this example, autocrine communication is evaluated for cell type A by using its differentially expressed genes with respect to cell type B (cell type A-specific genes are located in the coloured quadrant). Analogously to the correlation score, for non-autocrine communication we would need to consider differentially expressed genes in each of the cell types or samples. For a given pair of cells, we can say that a communication pathway is active when the ligand is differentially expressed in one cell and its cognate receptor is differentially expressed in the other. FC, fold change.
Belows, we introduce cellphoneDB, the basic idea is:
To assess cellular crosstalk between different cell types, we used our repository in a statistical framework for inferring cell–cell communication networks from single-cell transcriptome data. We derived enriched receptor–ligand interactions between two cell types based on expression of a receptor by one cell type and a ligand by another cell type, using the droplet-based data. To identify the most relevant interactions between cell types, we looked for the cell-type specific interactions between ligands and receptors. Only receptors and ligands expressed in more than 10% of the cells in the specific cluster were considered.
We performed pairwise comparisons between all cell types. First, we randomly permuted the cluster labels of all cells 1,000 times and determined the mean of the average receptor expression level of a cluster and the average ligand expression level of the interacting cluster. For each receptor–ligand pair in each pairwise comparison between two cell types, this generated a null distribution. By calculating the proportion of the means which are ‘as or more extreme’ than the actual mean, we obtained a P value for the likelihood of cell-type specificity of a given receptor–ligand complex. We then prioritized interactions that are highly enriched between cell types based on the number of significant pairs, and manually selected biologically relevant ones. For the multi-subunit heteromeric complexes, we required that all subunits of the complex are expressed (using a threshold of 10%), and therefore we used the member of the complex with the minimum average expression to perform the random shuffling.
Citation:
The latest version: cellphoneDBv5
### better run in a conda enviroments
! pip install cellphonedb
import pandas as pd
import glob
import os
from IPython.display import HTML, display
from cellphonedb.utils import db_releases_utils
# Direct show figure in jupyter
%matplotlib inline
# Check database version
display(HTML(db_releases_utils.get_remote_database_versions_html()['db_releases_html_table']))
# -- Version of the databse
cpdb_version = 'v5.0.0'
# -- Path where the input files to generate the database are located
cpdb_target_dir = os.path.join('./data/cellphoneDB', cpdb_version)
# donwload the database
from cellphonedb.utils import db_utils
db_utils.download_database(cpdb_target_dir, cpdb_version)
Downloaded cellphonedb.zip into ./data/cellphoneDB/v5.0.0 Downloaded complex_input.csv into ./data/cellphoneDB/v5.0.0 Downloaded gene_input.csv into ./data/cellphoneDB/v5.0.0 Downloaded interaction_input.csv into ./data/cellphoneDB/v5.0.0 Downloaded protein_input.csv into ./data/cellphoneDB/v5.0.0 Downloaded uniprot_synonyms.tsv into ./data/cellphoneDB/v5.0.0/sources Downloaded transcription_factor_input.csv into ./data/cellphoneDB/v5.0.0/sources
The statistial method accepts 4 input files (3 mandatory).
cellphonedb.zip.metadata.tsv.normalised_log_counts.h5ad.The microenvs_file_path content will depend on the biological question that the researcher wants to answer.
In this example we are studying how cell-cell interactions change between a subset of immune cells and trophoblast cells as the trophoblast differentiate and invade the maternal uterus. This module will randomly permute the cluster labels of all cells whitin each microenvironement (
microenvs_file_path) 1,000 times (default) and determine the mean of the average receptor expression level in a cluster and the average ligand expression level in the interacting cluster. Then, we will obtain a P-value for the likelihood of cell-type specificity of a given receptor–ligand complex.
# set paths
cpdb_file_path = 'data/cellphoneDB/v5.0.0/cellphonedb.zip'
meta_file_path = 'data/cellphoneDB/metadata.tsv'
counts_file_path = 'data/cellphoneDB/normalised_log_counts.h5ad'
microenvs_file_path = 'data/cellphoneDB/microenvironment.tsv'
out_path = 'results/cellphoneDB/score/'
1) The metadata file is compossed of two columns:
metadata = pd.read_csv(meta_file_path, sep = '\t')
metadata.head(3)
| barcode_sample | cell_type | |
|---|---|---|
| 0 | AGCGATTAGTCTAACC-1_Pla_HDBR10917733 | B_cells |
| 1 | ATCCGTGAGGCTAGAA-1_Pla_Camb10714918 | B_cells |
| 2 | AGTAACCCATTAAAGG-1_Pla_HDBR10917733 | B_cells |
2) The counts files is a scanpy h5ad object. The dimensions and order of this object must coincide with the dimensions of the metadata file (i.e. must have the same number of cells in both files).
import anndata
adata = anndata.read_h5ad(counts_file_path)
adata.shape
(3312, 30800)
Check barcodes in metadata and counts are the same.
list(adata.obs.index).sort() == list(metadata['barcode_sample']).sort()
True
3) Micronevironments defines the cell types that belong to a a given microenvironment. CellphoneDB will only calculate interactions between cells that belong to a given microenvironment. In this file we are defining just one microenvionment.
microenv = pd.read_csv(microenvs_file_path,
sep = '\t')
microenv.head(3)
| cell_type | microenvironment | |
|---|---|---|
| 0 | PV MMP11 | Env1 |
| 1 | PV MYH11 | Env1 |
| 2 | PV STEAP4 | Env1 |
microenv.groupby('microenvironment', group_keys = False)['cell_type'] \
.apply(lambda x : list(x.value_counts().index))
microenvironment Env1 [PV MMP11, PV MYH11, PV STEAP4, EVT_1, EVT_2, ... Name: cell_type, dtype: object
The output of this method will be saved in output_path and also returned to the predefined variables.
from cellphonedb.src.core.methods import cpdb_analysis_method
cpdb_results = cpdb_analysis_method.call(
cpdb_file_path = cpdb_file_path, # mandatory: CellphoneDB database zip file.
meta_file_path = meta_file_path, # mandatory: tsv file defining barcodes to cell label.
counts_file_path = counts_file_path, # mandatory: normalized count matrix - a path to the counts file, or an in-memory AnnData object
counts_data = 'hgnc_symbol', # defines the gene annotation in counts matrix.
microenvs_file_path = microenvs_file_path, # optional (default: None): defines cells per microenvironment.
score_interactions = True, # optional: whether to score interactions or not.
output_path = out_path, # Path to save results microenvs_file_path = None,
separator = '|', # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
threads = 5, # number of threads to use in the analysis.
threshold = 0.1, # defines the min % of cells expressing a gene for this to be employed in the analysis.
result_precision = 3, # Sets the rounding for the mean values in significan_means.
debug = False, # Saves all intermediate tables emplyed during the analysis in pkl format.
output_suffix = None # Replaces the timestamp in the output files by a user defined string in the (default: None)
)
[ ][CORE][10/11/25-12:11:17][INFO] [Non Statistical Method] Threshold:0.1 Precision:3 Reading user files... The following user files were loaded successfully: data/cellphoneDB/normalised_log_counts.h5ad data/cellphoneDB/metadata.tsv data/cellphoneDB/microenvironment.tsv [ ][CORE][10/11/25-12:11:20][INFO] Running Basic Analysis [ ][CORE][10/11/25-12:11:20][INFO] Limiting cluster combinations using microenvironments [ ][CORE][10/11/25-12:11:20][INFO] Building results [ ][CORE][10/11/25-12:11:21][INFO] Scoring interactions: Filtering genes per cell type..
100%|██████████| 42/42 [00:00<00:00, 330.13it/s]
[ ][CORE][10/11/25-12:11:21][INFO] Scoring interactions: Calculating mean expression of each gene per group/cell type..
100%|██████████| 42/42 [00:00<00:00, 1030.37it/s]
/home/zongxu/miniconda3/envs/spatialCC/lib/python3.11/site-packages/cellphonedb/utils/scoring_utils.py:138: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
matrix[index_name].replace(to_replace=id2name, inplace=True)
[ ][CORE][10/11/25-12:11:22][INFO] Scoring interactions: Calculating scores for all interactions and cell types..
100%|██████████| 81/81 [00:17<00:00, 4.70it/s]
Saved means_result to results/cellphoneDB/score/simple_analysis_means_result_11_10_2025_121139.txt Saved deconvoluted to results/cellphoneDB/score/simple_analysis_deconvoluted_11_10_2025_121139.txt Saved deconvoluted_percents to results/cellphoneDB/score/simple_analysis_deconvoluted_percents_11_10_2025_121139.txt Saved interaction_scores to results/cellphoneDB/score/simple_analysis_interaction_scores_11_10_2025_121139.txt
print(cpdb_results.keys())
dict_keys(['means_result', 'deconvoluted', 'deconvoluted_percents', 'interaction_scores'])
Most output files share common columns:
Means fields:
cpdb_results['means_result'].head(2)
| id_cp_interaction | interacting_pair | partner_a | partner_b | gene_a | gene_b | secreted | receptor_a | receptor_b | annotation_strategy | ... | GC|eEVT | eEVT|PV MMP11 | eEVT|PV MYH11 | eEVT|PV STEAP4 | eEVT|EVT_1 | eEVT|EVT_2 | eEVT|iEVT | eEVT|VCT_CCC | eEVT|GC | eEVT|eEVT | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CPI-SC0A2DB962D | CDH1_integrin_a2b1_complex | simple:P12830 | complex:integrin_a2b1_complex | CDH1 | NaN | False | False | False | curated | ... | 0.433 | 0.232 | 0.0 | 0.0 | 0.198 | 0.225 | 0.215 | 0.203 | 0.225 | 0.472 |
| 1 | CPI-SC0B5CEA47D | COL10A1_integrin_a2b1_complex | simple:Q03692 | complex:integrin_a2b1_complex | COL10A1 | NaN | True | False | False | curated | ... | 0.000 | 0.049 | 0.0 | 0.0 | 0.015 | 0.043 | 0.032 | 0.020 | 0.042 | 0.289 |
2 rows × 94 columns
cpdb_results['interaction_scores'].head(2)
| id_cp_interaction | interacting_pair | partner_a | partner_b | gene_a | gene_b | secreted | receptor_a | receptor_b | annotation_strategy | ... | GC|eEVT | eEVT|PV MMP11 | eEVT|PV MYH11 | eEVT|PV STEAP4 | eEVT|EVT_1 | eEVT|EVT_2 | eEVT|iEVT | eEVT|VCT_CCC | eEVT|GC | eEVT|eEVT | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CPI-SC0A2DB962D | CDH1_integrin_a2b1_complex | simple:P12830 | complex:integrin_a2b1_complex | CDH1 | NaN | False | False | False | curated | ... | 41.35 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 51.391 |
| 1 | CPI-SC0B5CEA47D | COL10A1_integrin_a2b1_complex | simple:Q03692 | complex:integrin_a2b1_complex | COL10A1 | NaN | True | False | False | curated | ... | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 |
2 rows × 94 columns
Deconvoluted fields:
cpdb_results['deconvoluted'].head(2)
| gene_name | uniprot | is_complex | protein_name | complex_name | id_cp_interaction | gene | B_cells | DC | EVT_1 | ... | dS1 | dS2 | dS3 | dT_cells | dT_regs | eEVT | fF1 | fF2 | iEVT | uSMC | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| multidata_id | |||||||||||||||||||||
| 1355 | UBASH3B | Q8TF42 | True | UBS3B_HUMAN | Dehydroepiandrosterone_bySTS | CPI-CS09B8977D7 | UBASH3B | 0.24 | 0.0 | 0.621 | ... | 0.118 | 0.224 | 0.162 | 0.025 | 0.124 | 1.182 | 0.055 | 0.0 | 1.402 | 0.338 |
| 1355 | UBASH3B | Q8TF42 | True | UBS3B_HUMAN | Dehydroepiandrosterone_bySTS | CPI-CS05760BB78 | UBASH3B | 0.24 | 0.0 | 0.621 | ... | 0.118 | 0.224 | 0.162 | 0.025 | 0.124 | 1.182 | 0.055 | 0.0 | 1.402 | 0.338 |
2 rows × 49 columns
Deconvoluted percents fields:
cpdb_results['deconvoluted_percents'].head(2)
| gene_name | uniprot | is_complex | protein_name | complex_name | id_cp_interaction | gene | B_cells | DC | EVT_1 | ... | dS1 | dS2 | dS3 | dT_cells | dT_regs | eEVT | fF1 | fF2 | iEVT | uSMC | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| multidata_id | |||||||||||||||||||||
| 1355 | UBASH3B | Q8TF42 | True | UBS3B_HUMAN | Dehydroepiandrosterone_bySTS | CPI-CS09B8977D7 | UBASH3B | 0.125 | 0.0 | 0.698 | ... | 0.123 | 0.24 | 0.354 | 0.021 | 0.053 | 0.857 | 0.048 | 0.0 | 0.892 | 0.359 |
| 1355 | UBASH3B | Q8TF42 | True | UBS3B_HUMAN | Dehydroepiandrosterone_bySTS | CPI-CS05760BB78 | UBASH3B | 0.125 | 0.0 | 0.698 | ... | 0.123 | 0.24 | 0.354 | 0.021 | 0.053 | 0.857 | 0.048 | 0.0 | 0.892 | 0.359 |
2 rows × 49 columns
The output of this method will be saved in output_path and also returned to the predefined variables.
The statisical method allows the user to downsample the data with the aim of speeding up the results (subsampling arguments). To this end, CellphoneDB employs a geometric sketching procedure (Hie et al. 2019) to preserve the structure of the data without losing information from lowly represented cells. For this tutorial, we have opted to manually downsample the count matrix and the metadata file accordingly.
# set path
cpdb_file_path = 'data/cellphoneDB/v5.0.0/cellphonedb.zip'
meta_file_path = 'data/cellphoneDB/metadata.tsv'
counts_file_path = 'data/cellphoneDB/normalised_log_counts.h5ad'
microenvs_file_path = 'data/cellphoneDB/microenvironment.tsv'
active_tf_path = 'data/cellphoneDB/active_TFs.tsv'
out_path = 'results/cellphoneDB/sig/'
from cellphonedb.src.core.methods import cpdb_statistical_analysis_method
cpdb_results = cpdb_statistical_analysis_method.call(
cpdb_file_path = cpdb_file_path, # mandatory: CellphoneDB database zip file.
meta_file_path = meta_file_path, # mandatory: tsv file defining barcodes to cell label.
counts_file_path = counts_file_path, # mandatory: normalized count matrix - a path to the counts file, or an in-memory AnnData object
counts_data = 'hgnc_symbol', # defines the gene annotation in counts matrix.
active_tfs_file_path = active_tf_path, # optional: defines cell types and their active TFs.
microenvs_file_path = microenvs_file_path, # optional (default: None): defines cells per microenvironment.
score_interactions = True, # optional: whether to score interactions or not.
iterations = 1000, # denotes the number of shufflings performed in the analysis.
threshold = 0.1, # defines the min % of cells expressing a gene for this to be employed in the analysis.
threads = 5, # number of threads to use in the analysis.
debug_seed = 42, # debug randome seed. To disable >=0.
result_precision = 3, # Sets the rounding for the mean values in significan_means.
pvalue = 0.05, # P-value threshold to employ for significance.
subsampling = False, # To enable subsampling the data (geometri sketching).
subsampling_log = False, # (mandatory) enable subsampling log1p for non log-transformed data inputs.
subsampling_num_pc = 100, # Number of componets to subsample via geometric skectching (dafault: 100).
subsampling_num_cells = 1000, # Number of cells to subsample (integer) (default: 1/3 of the dataset).
separator = '|', # Sets the string to employ to separate cells in the results dataframes "cellA|CellB".
debug = False, # Saves all intermediate tables employed during the analysis in pkl format.
output_path = out_path, # Path to save results.
output_suffix = None # Replaces the timestamp in the output files by a user defined string in the (default: None).
)
Reading user files... The following user files were loaded successfully: data/cellphoneDB/normalised_log_counts.h5ad data/cellphoneDB/metadata.tsv data/cellphoneDB/microenvironment.tsv data/cellphoneDB/active_TFs.tsv [ ][CORE][10/11/25-12:11:42][INFO] [Cluster Statistical Analysis] Threshold:0.1 Iterations:1000 Debug-seed:42 Threads:5 Precision:3 [ ][CORE][10/11/25-12:11:42][WARNING] Debug random seed enabled. Set to 42 [ ][CORE][10/11/25-12:11:42][INFO] Running Real Analysis [ ][CORE][10/11/25-12:11:42][INFO] Limiting cluster combinations using microenvironments [ ][CORE][10/11/25-12:11:42][INFO] Running Statistical Analysis
100%|██████████| 1000/1000 [00:25<00:00, 39.41it/s]
[ ][CORE][10/11/25-12:12:08][INFO] Building Pvalues result
[ ][CORE][10/11/25-12:12:08][INFO] Building results [ ][CORE][10/11/25-12:12:09][INFO] Scoring interactions: Filtering genes per cell type..
100%|██████████| 42/42 [00:00<00:00, 314.53it/s]
[ ][CORE][10/11/25-12:12:10][INFO] Scoring interactions: Calculating mean expression of each gene per group/cell type..
100%|██████████| 42/42 [00:00<00:00, 920.28it/s]
[ ][CORE][10/11/25-12:12:10][INFO] Scoring interactions: Calculating scores for all interactions and cell types..
100%|██████████| 81/81 [00:16<00:00, 4.95it/s]
Saved deconvoluted to results/cellphoneDB/sig/statistical_analysis_deconvoluted_11_10_2025_121227.txt Saved deconvoluted_percents to results/cellphoneDB/sig/statistical_analysis_deconvoluted_percents_11_10_2025_121227.txt Saved means to results/cellphoneDB/sig/statistical_analysis_means_11_10_2025_121227.txt Saved pvalues to results/cellphoneDB/sig/statistical_analysis_pvalues_11_10_2025_121227.txt Saved significant_means to results/cellphoneDB/sig/statistical_analysis_significant_means_11_10_2025_121227.txt Saved CellSign_active_interactions to results/cellphoneDB/sig/statistical_analysis_CellSign_active_interactions_11_10_2025_121227.txt Saved CellSign_active_interactions_deconvoluted to results/cellphoneDB/sig/statistical_analysis_CellSign_active_interactions_deconvoluted_11_10_2025_121227.txt Saved interaction_scores to results/cellphoneDB/sig/statistical_analysis_interaction_scores_11_10_2025_121227.txt
# help(cpdb_statistical_analysis_method.call)
Results are save into the predefined file and as a dictionary in the cpdb_results variable.
list(cpdb_results.keys())
['deconvoluted', 'deconvoluted_percents', 'means', 'pvalues', 'significant_means', 'CellSign_active_interactions', 'CellSign_active_interactions_deconvoluted', 'interaction_scores']
Most output files share common columns:
Pvalues fields:
cpdb_results['pvalues'].head(2)
| id_cp_interaction | interacting_pair | partner_a | partner_b | gene_a | gene_b | secreted | receptor_a | receptor_b | annotation_strategy | ... | GC|eEVT | eEVT|PV MMP11 | eEVT|PV MYH11 | eEVT|PV STEAP4 | eEVT|EVT_1 | eEVT|EVT_2 | eEVT|iEVT | eEVT|VCT_CCC | eEVT|GC | eEVT|eEVT | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CPI-SC0A2DB962D | CDH1_integrin_a2b1_complex | simple:P12830 | complex:integrin_a2b1_complex | CDH1 | NaN | False | False | False | curated | ... | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 |
| 1 | CPI-SC0B5CEA47D | COL10A1_integrin_a2b1_complex | simple:Q03692 | complex:integrin_a2b1_complex | COL10A1 | NaN | True | False | False | curated | ... | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
2 rows × 94 columns
Means fields:
cpdb_results['means'].head(2)
| id_cp_interaction | interacting_pair | partner_a | partner_b | gene_a | gene_b | secreted | receptor_a | receptor_b | annotation_strategy | ... | GC|eEVT | eEVT|PV MMP11 | eEVT|PV MYH11 | eEVT|PV STEAP4 | eEVT|EVT_1 | eEVT|EVT_2 | eEVT|iEVT | eEVT|VCT_CCC | eEVT|GC | eEVT|eEVT | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CPI-SC0A2DB962D | CDH1_integrin_a2b1_complex | simple:P12830 | complex:integrin_a2b1_complex | CDH1 | NaN | False | False | False | curated | ... | 0.433 | 0.232 | 0.0 | 0.0 | 0.198 | 0.225 | 0.215 | 0.203 | 0.225 | 0.472 |
| 1 | CPI-SC0B5CEA47D | COL10A1_integrin_a2b1_complex | simple:Q03692 | complex:integrin_a2b1_complex | COL10A1 | NaN | True | False | False | curated | ... | 0.000 | 0.049 | 0.0 | 0.0 | 0.015 | 0.043 | 0.032 | 0.020 | 0.042 | 0.289 |
2 rows × 94 columns
Significant means fields:
cpdb_results['significant_means'].head(2)
| id_cp_interaction | interacting_pair | partner_a | partner_b | gene_a | gene_b | secreted | receptor_a | receptor_b | annotation_strategy | ... | GC|eEVT | eEVT|PV MMP11 | eEVT|PV MYH11 | eEVT|PV STEAP4 | eEVT|EVT_1 | eEVT|EVT_2 | eEVT|iEVT | eEVT|VCT_CCC | eEVT|GC | eEVT|eEVT | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2581 | CPI-SC012DB3194 | WNT10A_FZD5_LRP5 | simple:Q9GZT5 | complex:FZD5_LRP5 | WNT10A | NaN | True | False | False | curated | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2427 | CPI-SC0065F5B8A | WNT10A_FZD1_LRP6 | simple:Q9GZT5 | complex:FZD1_LRP6 | WNT10A | NaN | True | False | False | curated | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 rows × 95 columns
Interaction scores fields:
cpdb_results['interaction_scores'].head(2)
| id_cp_interaction | interacting_pair | partner_a | partner_b | gene_a | gene_b | secreted | receptor_a | receptor_b | annotation_strategy | ... | GC|eEVT | eEVT|PV MMP11 | eEVT|PV MYH11 | eEVT|PV STEAP4 | eEVT|EVT_1 | eEVT|EVT_2 | eEVT|iEVT | eEVT|VCT_CCC | eEVT|GC | eEVT|eEVT | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CPI-SC0A2DB962D | CDH1_integrin_a2b1_complex | simple:P12830 | complex:integrin_a2b1_complex | CDH1 | NaN | False | False | False | curated | ... | 41.35 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 51.391 |
| 1 | CPI-SC0B5CEA47D | COL10A1_integrin_a2b1_complex | simple:Q03692 | complex:integrin_a2b1_complex | COL10A1 | NaN | True | False | False | curated | ... | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000 |
2 rows × 94 columns
CellSign fields:
cpdb_results['CellSign_active_interactions'].head(2)
| id_cp_interaction | interacting_pair | partner_a | partner_b | gene_a | gene_b | secreted | receptor_a | receptor_b | annotation_strategy | ... | GC|eEVT | eEVT|PV MMP11 | eEVT|PV MYH11 | eEVT|PV STEAP4 | eEVT|EVT_1 | eEVT|EVT_2 | eEVT|iEVT | eEVT|VCT_CCC | eEVT|GC | eEVT|eEVT | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | CPI-SS0CC966B71 | BTC_EGFR | simple:P35070 | simple:P00533 | BTC | EGFR | True | False | True | curated | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | CPI-SS076940BCD | LGALS3_MERTK | simple:P17931 | simple:Q12866 | LGALS3 | MERTK | True | False | True | curated | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 rows × 95 columns
Deconvoluted fields:
cpdb_results['deconvoluted'].head(2)
| gene_name | uniprot | is_complex | protein_name | complex_name | id_cp_interaction | gene | B_cells | DC | EVT_1 | ... | dS1 | dS2 | dS3 | dT_cells | dT_regs | eEVT | fF1 | fF2 | iEVT | uSMC | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| multidata_id | |||||||||||||||||||||
| 1355 | UBASH3B | Q8TF42 | True | UBS3B_HUMAN | Dehydroepiandrosterone_bySTS | CPI-CS09B8977D7 | UBASH3B | 0.24 | 0.0 | 0.621 | ... | 0.118 | 0.224 | 0.162 | 0.025 | 0.124 | 1.182 | 0.055 | 0.0 | 1.402 | 0.338 |
| 1355 | UBASH3B | Q8TF42 | True | UBS3B_HUMAN | Dehydroepiandrosterone_bySTS | CPI-CS05760BB78 | UBASH3B | 0.24 | 0.0 | 0.621 | ... | 0.118 | 0.224 | 0.162 | 0.025 | 0.124 | 1.182 | 0.055 | 0.0 | 1.402 | 0.338 |
2 rows × 49 columns
This module allows to filter CellphoneDB results by specifying: cell types pairs, genes or specific interactions.
We can specify two list of cell types (query_cell_type_1and query_cell_type_2), the method will perform a pairwise comparison between the cell types defined in each list and will subset the interactions to those pairs of cells. Cell types within each list will not be paired. If we are interested in filtering interactions ocurring between a given cell to all the rest of cells in the dataset we can define query_cell_type_1 = 'All' and query_cell_type_2 = ['cellA', 'cellB', ...]. The argument genes allows the user to filter interactions in which a gene participates, the users can also define specific interactions based on the interaction name query_interactions.
In this example we are going to filter interactions in which a subset of trophoblast interact with the PV cells, furthermore, we will select those interactions in which the gene TGFBR1 participates and a specific interaction named CSF1_CSF1R.
This method filters the rows and columns of the significant_means file; NaN value correspond to interacting pairs found not significant, non-NaN value correspond to those interacting pairs found relevant.
from cellphonedb.utils import search_utils
search_results = search_utils.search_analysis_results(
query_cell_types_1 = ['EVT_1', 'EVT_2', 'GC', 'eEVT', 'iEVT'], # List of cells 1, will be paired to cells 2 (list or 'All').
query_cell_types_2 = ['PV MMP11', 'PV MYH11', 'PV STEAP4'], # List of cells 2, will be paired to cells 1 (list or 'All').
query_genes = ['TGFBR1'], # filter interactions based on the genes participating (list).
query_interactions = ['CSF1_CSF1R'], # filter intereactions based on their name (list).
significant_means = cpdb_results['significant_means'], # significant_means file generated by CellphoneDB.
deconvoluted = cpdb_results['deconvoluted'], # devonvoluted file generated by CellphoneDB.
interaction_scores = cpdb_results['interaction_scores'], # interaction score generated by CellphoneDB.
query_minimum_score = 50, # minimum score that an interaction must have to be filtered.
separator = '|', # separator (default: |) employed to split cells (cellA|cellB).
long_format = True, # converts the output into a wide table, removing non-significant interactions
query_classifications = ['Signaling by Transforming growth factor']
)
search_results.head()
| interacting_pair | partner_a | partner_b | gene_a | gene_b | directionality | classification | interacting_cells | significant_mean | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | GDF11_TGFR_AVR2B | simple:O95390 | complex:TGFR_AVR2B | GDF11 | NaN | Ligand-Receptor | Signaling by Growth differentiation factor | PV MMP11|EVT_2 | 0.105 |
| 4 | TGFB1_TGFBR3 | simple:P01137 | simple:Q03167 | TGFB1 | TGFBR3 | Ligand-Receptor | Signaling by Transforming growth factor | PV MMP11|EVT_2 | 0.590 |
| 5 | GDF11_TGFR_AVR2B | simple:O95390 | complex:TGFR_AVR2B | GDF11 | NaN | Ligand-Receptor | Signaling by Growth differentiation factor | PV MMP11|iEVT | 0.105 |
| 8 | TGFB1_TGFbeta_receptor1 | simple:P01137 | complex:TGFbeta_receptor1 | TGFB1 | NaN | Ligand-Receptor | Signaling by Transforming growth factor | PV MMP11|iEVT | 0.446 |
| 9 | TGFB1_TGFBR3 | simple:P01137 | simple:Q03167 | TGFB1 | TGFBR3 | Ligand-Receptor | Signaling by Transforming growth factor | PV MMP11|iEVT | 0.771 |
In this section of the tutorial we will make use of Kelvin's implementations (kt-plots tutorial) to perform some exploratory analysis of the results obtained from CellphoneDB. It is of great importance that the user is familiarized with the biological context of the data to obtain meaninful results from CellphoneDB results.
import os
import anndata as ad
import pandas as pd
import ktplotspy as kpy
import matplotlib.pyplot as plt
%matplotlib inline
Display as heatmap the number of significant interactions found by CellphoneDB between all of the cell pairs. Only cells found in the microenvironment will be represented in this plot.
kpy.plot_cpdb_heatmap(pvals = cpdb_results['pvalues'],
degs_analysis = False,
figsize = (5, 5),
title = "Sum of significant interactions")
<seaborn.matrix.ClusterGrid at 0x7fede08651d0>
Here we are plotting the interactions between the PVs and the trophoblasts that are mediated by TGFB2 and CSF1R.
kpy.plot_cpdb(
adata = adata,
cell_type1 = "PV MYH11|PV STEAP4|PV MMPP11",
cell_type2 = "EVT_1|EVT_2|GC|iEVT|eEVT|VCT_CCC",
means = cpdb_results['means'],
pvals = cpdb_results['pvalues'],
celltype_key = "cell_labels",
genes = ["TGFB2", "CSF1R"],
figsize = (10, 3),
title = "Interactions between\nPV and trophoblast",
max_size = 3,
highlight_size = 0.75,
degs_analysis = False,
standard_scale = True,
interaction_scores = cpdb_results['interaction_scores'],
scale_alpha_by_interaction_scores = True
)
Interactions can also be plotted grouped by pathway.
from plotnine import facet_wrap
p = kpy.plot_cpdb(
adata = adata,
cell_type1 = "PV MYH11",
cell_type2 = "EVT_1|EVT_2|GC|iEVT|eEVT|VCT_CCC",
means = cpdb_results['means'],
pvals = cpdb_results['pvalues'],
celltype_key = "cell_labels",
genes = ["TGFB2", "CSF1R", "COL1A1"],
figsize = (12, 8),
title = "Interactions between PV and trophoblast\ns grouped by classification",
max_size = 6,
highlight_size = 0.75,
degs_analysis = False,
standard_scale = True,
)
p + facet_wrap("~ classification", ncol = 1)
import sys
import pkg_resources
print("### Session Info ###\n")
loaded_pkgs = []
for mod_name, module in sys.modules.items():
file_path = getattr(module, "__file__", None)
if (
file_path
and "site-packages" in file_path
and not mod_name.startswith("_")
and "." not in mod_name
):
try:
version = pkg_resources.get_distribution(mod_name).version
loaded_pkgs.append((mod_name, version))
except Exception:
pass
for name, version in sorted(set(loaded_pkgs)):
print(f"{name:<15} v{version}")
/tmp/ipykernel_101942/219190044.py:2: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
### Session Info ### IPython v8.20.0 anndata v0.10.5.post1 asciitree v0.3.3 asttokens v2.4.1 brotli v1.1.0 cellphonedb v5.0.1 certifi v2025.7.14 cffi v1.16.0 chardet v5.2.0 charset_normalizer v3.3.2 cloudpickle v3.0.0 colorama v0.4.6 comm v0.2.1 cycler v0.12.1 cytoolz v0.12.2 dask v2024.1.1 debugpy v1.8.0 decorator v5.1.1 executing v2.0.1 fastcluster v1.2.6 fasteners v0.17.3 fbpca v1.0 geosketch v1.3 h5py v3.12.1 idna v3.6 importlib_metadata v7.0.1 ipykernel v6.29.0 jedi v0.19.1 jinja2 v3.1.3 joblib v1.3.2 jupyter_client v8.6.0 jupyter_core v5.7.1 kiwisolver v1.4.5 ktplotspy v0.3.3 llvmlite v0.43.0 lz4 v4.3.3 markupsafe v2.1.4 matplotlib v3.8.2 matplotlib_inline v0.1.6 mizani v0.14.3 msgpack v1.0.7 natsort v8.4.0 numba v0.60.0 numcodecs v0.12.1 numpy v1.26.4 numpy_groupies v0.11.3 packaging v23.2 pandas v2.2.0 parso v0.8.3 patsy v0.5.6 pexpect v4.9.0 pickleshare v0.7.5 platformdirs v4.1.0 plotnine v0.15.1 prompt_toolkit v3.0.42 psutil v5.9.8 ptyprocess v0.7.0 pure_eval v0.2.2 pyarrow v17.0.0 pycirclize v1.10.1 pycparser v2.21 pygments v2.17.2 pyparsing v3.1.1 pytz v2023.3.post1 requests v2.31.0 scipy v1.12.0 seaborn v0.13.2 six v1.16.0 stack_data v0.6.2 statsmodels v0.14.5 tabulate v0.9.0 tblib v3.0.0 threadpoolctl v3.2.0 toolz v0.12.1 tornado v6.3.3 tqdm v4.66.1 traitlets v5.14.1 typing_extensions v4.14.1 urllib3 v2.5.0 wcwidth v0.2.13 zarr v2.16.1 zipp v3.17.0
For chord plot, refer to:
https://github.com/moshi4/pyCirclize
https://jokergoo.github.io/circlize/